
LaurentAerens

diff_prettyHtml currently doesn't visibly highlight leading/trailing space differences, because such spaces aren't visible when the HTML is rendered.

This pull request changes that.

Instead of

<ins style="background:#e6ffe6;"> </ins>

diff_prettyHtml now returns

<ins style="background:#e6ffe6;">&nbsp;</ins>

so that differences in leading/trailing spaces become visible in the rendered HTML.

Example:
Compare "Test" and "Test " (note that the second string differs from the first because it has a trailing space):

var dmp = new diff_match_patch();
var diff = dmp.diff_main("Test", "Test "); // this results in the following diff: [[0, 'Test'], [1, ' ']]

Previously, it looked like this:

[Screenshot: rendered output before this change]

With this pull request, it now looks like this:

[Screenshot: rendered output with this change]
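A minimal JavaScript sketch of the idea, assuming a simplified renderer modeled on diff_prettyHtml's escaping. The helpers `escapeDiffText` and `prettyHtml` are hypothetical and are not the PR's actual patch. Like the change described above, this version turns every space into `&nbsp;`, so the whitespace-only insertion from the "Test"/"Test " example renders as a visible highlighted block.

```javascript
// Operation codes as used by diff-match-patch (redeclared here so the sketch is self-contained).
const DIFF_DELETE = -1;
const DIFF_INSERT = 1;
const DIFF_EQUAL = 0;

// Escape HTML-special characters, then make spaces non-collapsible.
// Assumption: every space becomes &nbsp; (the word-wrap concern raised later
// in this thread applies to exactly this choice).
function escapeDiffText(text) {
  return text
    .replace(/&/g, '&amp;') // must run first so later entities aren't re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/\n/g, '&para;<br>')
    .replace(/ /g, '&nbsp;');
}

// Render a diff given as plain [op, text] arrays, in the style of diff_prettyHtml.
function prettyHtml(diffs) {
  return diffs.map(([op, data]) => {
    const text = escapeDiffText(data);
    switch (op) {
      case DIFF_INSERT:
        return '<ins style="background:#e6ffe6;">' + text + '</ins>';
      case DIFF_DELETE:
        return '<del style="background:#ffe6e6;">' + text + '</del>';
      default: // DIFF_EQUAL
        return '<span>' + text + '</span>';
    }
  }).join('');
}

console.log(prettyHtml([[DIFF_EQUAL, 'Test'], [DIFF_INSERT, ' ']]));
// <span>Test</span><ins style="background:#e6ffe6;">&nbsp;</ins>
```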

Note: this issue also seems to be present in the JavaScript implementation: google#144

@dmsnell
Owner

dmsnell commented Oct 14, 2025

@LaurentAerens it seems like this would be a valuable change. How do you feel about proposing it for all of the libraries so that the C# implementation doesn’t behave differently from the others?

Further (and this could be a follow-up enhancement), it seems like this logic could be moved into a function dedicated to transforming the display of diffs for pretty HTML viewing. That function might also handle other changes, such as replacing C0 control characters with their visual representations, e.g. displaying “␛” instead of the raw byte 0x1B.
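A sketch of the kind of display-only transform suggested here, assuming a hypothetical `toDisplayText` helper that is not part of diff-match-patch. It maps C0 control characters (U+0000 to U+001F) onto the Unicode Control Pictures block by adding 0x2400, so U+001B displays as “␛” (U+241B). Tab and line feed are deliberately left alone, on the assumption that the HTML renderer keeps handling them separately (diff_prettyHtml already turns newlines into `&para;<br>`).

```javascript
// Hypothetical display transform: replace C0 control characters with their
// Unicode "Control Pictures" (U+2400 + code point), e.g. U+001B -> U+241B "␛".
// Tab (U+0009) and line feed (U+000A) are excluded from the character class
// so existing tab/newline handling still applies (an assumption of this sketch).
function toDisplayText(text) {
  return text.replace(
    /[\u0000-\u0008\u000B-\u001F]/g,
    (ch) => String.fromCharCode(0x2400 + ch.charCodeAt(0))
  );
}

console.log(toDisplayText('\u001B[31mred')); // ␛[31mred
console.log(toDisplayText('a\tb\nc'));       // tab and newline are left unchanged
```

Keeping this separate from the HTML escaping means the mapping can be discussed and kept consistent across languages on its own, which is the motivation given above.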

Having a separate function could ease the cross-language consistency and provide a good place to discuss different representations.

@LaurentAerens
Author

I've updated it in all remaining languages.
Your idea of a separate function for this sounds great, and it's something I will take up when I have time. Maybe create an issue for it for now?

@LaurentAerens
Author

I'm unsure about the merge conflicts; I don't see any, so I'm a bit confused about what's going on.

@dmsnell
Owner

dmsnell commented Oct 15, 2025

@LaurentAerens aha, we converted the JS library to an npm package and renamed the files. I can fix this for you if you want. Otherwise, the changes should apply to javascript/index.js.

@LaurentAerens
Author

@dmsnell I think I'm having an issue pulling the latest version of main, so if you can do it for now, that would be amazing. I'm going to investigate why I can't pull the latest version.

@dmsnell
Owner

dmsnell commented Oct 15, 2025

@LaurentAerens I have pushed eb4a437. You should be able to run git reset --hard eb4a43780b30e6d008ec02689ae182bf72973290 and then force-push that to this PR’s branch.

In review, I verified that browsers render &Tab; like a space (U+0020) character and render multiple consecutive &#x20; character references as a single space. In other words, they perform whitespace normalization after decoding, contrary to what I had assumed. The motivation for this patch stands.
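To illustrate the behavior described above, here is a rough JavaScript model of rendering under the default CSS `white-space: normal`. It is not how browsers are implemented: the three-entry entity table and the `renderedText` helper are hypothetical simplifications, and line-edge trimming is ignored. Character references are decoded first, and only then are runs of space, tab, and newline collapsed to one space. U+00A0 (from `&nbsp;`) is not collapsible whitespace, so it survives, which is why the patch relies on it.

```javascript
// Minimal, hypothetical entity table covering only the references discussed here.
const ENTITIES = { '&Tab;': '\t', '&#x20;': ' ', '&nbsp;': '\u00A0' };

// Rough model of `white-space: normal`:
// 1) decode character references, 2) collapse runs of space/tab/newline to one space.
// U+00A0 is not in the collapsible set, so non-breaking spaces are preserved.
function renderedText(html) {
  const decoded = html.replace(/&Tab;|&#x20;|&nbsp;/g, (ref) => ENTITIES[ref]);
  return decoded.replace(/[ \t\n]+/g, ' ');
}

console.log(renderedText('a&#x20;&#x20;&#x20;b'));                  // a b
console.log(renderedText('a&Tab;b'));                              // a b
console.log(renderedText('a&nbsp;&nbsp;b') === 'a\u00A0\u00A0b'); // true
```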

That said, I think there’s a potential issue with this patch: because &nbsp; is a non-breaking space, it might prevent word-wrapping for diff elements with long runs of text. Can you try viewing a pretty diff in a browser with a sufficiently long span of inserted, deleted, or unchanged text and see whether it wraps?

If we have that problem, it may be worth limiting the replacement of tabs and spaces to those at the leading or trailing end of a string, which I think should be accomplishable with a tiny change to the regex patterns.
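One way such a tiny regex change could look, sketched in JavaScript under the assumption that the text has already been HTML-escaped. `nbspEdges` is a hypothetical helper, not the PR's code, and tabs are not handled here. Anchoring the pattern with `^` and `$` converts only the leading and trailing runs of spaces to `&nbsp;`, so interior spaces stay ordinary breakable spaces and long runs of text can still wrap.

```javascript
// Replace only the leading and trailing runs of spaces with &nbsp;.
// Without the `m` flag, ^ and $ match the start and end of the whole string.
// Interior spaces are left as ordinary spaces, so browsers can still wrap there.
function nbspEdges(text) {
  return text.replace(/^ +| +$/g, (run) => '&nbsp;'.repeat(run.length));
}

console.log(nbspEdges(' '));            // &nbsp;
console.log(nbspEdges('  two words ')); // &nbsp;&nbsp;two words&nbsp;
console.log(nbspEdges('a b'));          // a b
```

Because diff_prettyHtml escapes each diff operation's text separately, the anchors apply per operation, which is where a whitespace-only change such as the "Test"/"Test " insertion would sit.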
